Background: Motif analysis methods have long been central to studying the biological function of nucleotide sequences. Functional genomics experiments extend their potential: they typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Current motif discovery tools are limited in their ability to search large motif spaces, so more complex motifs may not be included. There is thus a need for motif analysis methods tailored to analyzing specific complex motifs motivated by biological questions and hypotheses, rather than acting as screen-based motif finding tools.

Methods: We present Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in ranked lists of sequences. Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact p-values for motif observations in sequences. Biases in motif distributions across ranked sequence lists are evaluated using random walks, Brownian bridges, or modified rank-based statistics. A modular setup and fast analytic p-value evaluations make Regmex applicable to diverse and potentially large-scale motif analysis problems.

Results: We demonstrate use cases of combined motifs on simulated data and on expression data from microRNA transfection experiments. We confirm previously obtained results and demonstrate the usability of Regmex to test a specific hypothesis about the relative location of microRNA seed sites and U-rich motifs. We further compare the tool with an existing motif discovery tool and show increased sensitivity.

Conclusions: Regmex is a useful and flexible tool for analyzing motif hypotheses that relate to large data sets in functional genomics. The method is available as an R package (https://github.com/muhligs/regmex).
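
To make the random-walk idea in the Methods concrete, the following is a minimal Python sketch and not the Regmex R package or its API. It assumes a deliberately simplified score: each sequence counts as a hit (1) or a miss (0) depending on whether a regular expression matches, whereas Regmex itself works from sequence-specific p-values computed with embedded Markov models. The function names `running_sum` and `max_deviation`, the toy sequences, and the example pattern are all hypothetical.

```python
import re


def running_sum(ranked_sequences, pattern):
    """Centred running sum of regex hits down a ranked sequence list.

    Simplifying assumption: a sequence scores 1 if the pattern occurs
    at least once and 0 otherwise. Subtracting the mean hit rate makes
    the walk return to zero at the end, as a bridge does.
    """
    if not ranked_sequences:
        raise ValueError("ranked_sequences must not be empty")
    regex = re.compile(pattern)
    hits = [1 if regex.search(seq) else 0 for seq in ranked_sequences]
    mean = sum(hits) / len(hits)
    walk, total = [], 0.0
    for hit in hits:
        total += hit - mean
        walk.append(total)
    return walk


def max_deviation(walk):
    """Signed extreme of the walk; a large value suggests rank bias."""
    return max(walk, key=abs)


# Hypothetical combined motif: a core site followed within 0 to 3
# nucleotides by a T-rich stretch (toy stand-in for seed + U-rich).
toy_ranked = ["AACGTATTTT", "ACGTTTTT", "GGGGGG", "CCCCCC"]
toy_walk = running_sum(toy_ranked, r"ACGT.{0,3}TTTT")
print(toy_walk, max_deviation(toy_walk))
```

In this toy list the motif occurs only in the two top-ranked sequences, so the walk rises and then returns to zero. Judging whether such a deviation is significant requires a null distribution, which this sketch does not compute.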